首页> 外文OA文献 >Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication
【2h】

Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication

机译:外部乘积并行稀疏矩阵矩阵乘法的同时输入和输出矩阵划分

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

For outer-product-parallel sparse matrix-matrix multiplication (SpGEMM) of the form C=A×B, we propose three hypergraph models that achieve simultaneous partitioning of input and output matrices without any replication of input data. All three hypergraph models perform conformable one-dimensional (1D) columnwise and 1D rowwise partitioning of the input matrices A and B, respectively. The first hypergraph model performs two-dimensional (2D) nonzero-based partitioning of the output matrix, whereas the second and third models perform 1D rowwise and 1D columnwise partitioning of the output matrix, respectively. This partitioning scheme induces a two-phase parallel SpGEMM algorithm, where communication-free local SpGEMM computations constitute the first phase and the multiple single-node-accumulation operations on the local SpGEMM results constitute the second phase. In these models, the two partitioning constraints defined on weights of vertices encode balancing computational loads of processors during the two separate phases of the parallel SpGEMM algorithm. The partitioning objective of minimizing the cutsize defined over the cut nets encodes minimizing the total volume of communication that will occur during the second phase of the parallel SpGEMM algorithm. An MPI-based parallel SpGEMM library is developed to verify the validity of our models in practice. Parallel runs of the library for a wide range of realistic SpGEMM instances on two large-scale parallel systems JUQUEEN (an IBM BlueGene/Q system) and SuperMUC (an Intel-based cluster) show that the proposed hypergraph models attain high speedup values. © 2014 Society for Industrial and Applied Mathematics.
机译:对于形式为C = A×B的外部乘积并行稀疏矩阵矩阵乘法(SpGEMM),我们提出了三种超图模型,这些模型可同时对输入和输出矩阵进行分区,而无需复制任何输入数据。所有三个超图模型分别对输入矩阵A和B执行一致的一维(1D)列和1D行划分。第一个超图模型执行输出矩阵的二维(2D)基于非零的划分,而第二个和第三个模型分别执行输出矩阵的1D行和1D列划分。这种划分方案引入了两阶段并行SpGEMM算法,其中无通信的本地SpGEMM计算构成第一阶段,而对本地SpGEMM结果的多个单节点累加操作构成第二阶段。在这些模型中,在顶点权重上定义的两个分区约束对并行SpGEMM算法的两个单独阶段中的处理器平衡计算负载进行编码。最小化切割网络上定义的切割大小的划分目标是编码,以最小化将在并行SpGEMM算法第二阶段发生的通信总量。开发了基于MPI的并行SpGEMM库,以在实践中验证我们模型的有效性。在两个大型并行系统JUQUEEN(IBM BlueGene / Q系统)和SuperMUC(基于Intel的集群)上,针对大量实际SpGEMM实例的库并行运行表明,所提出的超图模型具有较高的加速值。 ©2014工业和应用数学学会。

著录项

  • 作者

    Akbudak K., Aykanat C.;

  • 作者单位
  • 年度 2014
  • 总页数
  • 原文格式 PDF
  • 正文语种 English
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号